A NEW ADAPTIVE CLASSIFIER USING ITERATIVE FILTERING
By Arland L. Actkinson


1.0 SUMMARY


To cope with signature variability, an algorithm has been defined which will
adaptively classify remotely sensed data in the visible and near-infrared band.
The signal is divided into a space-dependent component and a target-dependent
component. The target-dependent component is assumed fixed across the image for
each target type. The space-dependent component is estimated iteratively by a
weighted least-squares algorithm. Included in this study are the derivations of
the sensor model and the two-dimensional estimation algorithm.


2.0 INTRODUCTION: THE PROBLEM 


The classification of remotely sensed image data, using current techniques, 
is severely hindered by the problem of signal variability. Signal variability 
refers to the vast differences in signals radiated by a single crop in response 
to different environmental conditions. Differences in soil type, local temperature,
water content of target, amount of haze and cloud cover, and many other factors can 
have a decisive effect on the spectral response. In addition, changes in the angle 
at which the target is observed affect the signal. 

One suggestion for dealing with signal variability is to use an adaptive 
classifier. An adaptive classifier alters the classification signatures in the 
course of an analysis. This usually means that after a resolution element is
classified, the signature of the class is altered by averaging in the new
observation. The averaging-in process may be weighted or unweighted. By using
this technique, variations in signal can be partially modeled.
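
The averaging-in step described above can be sketched as follows. This is only an illustration of the conventional adaptive classifier, not of the method proposed in this paper; the function name `update_signature` and the sample values are assumptions made for the example.

```python
def update_signature(mean, count, observation, weight=None):
    """Average a newly classified observation into a class mean.

    weight=None gives the unweighted running mean; otherwise the new
    observation receives the fixed weight `weight` (0 < weight <= 1).
    """
    if weight is None:
        weight = 1.0 / (count + 1)
    new_mean = [(1.0 - weight) * m + weight * x for m, x in zip(mean, observation)]
    return new_mean, count + 1


# Hypothetical two-channel signature built from 3 earlier observations.
mean, count = [10.0, 20.0], 3
mean, count = update_signature(mean, count, [14.0, 24.0])
print(mean, count)   # [11.0, 21.0] 4
```

If the new observation was misclassified, the class mean still moves toward it, which is the "capture" effect listed among the drawbacks.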

However, this adaptive classifier method has some drawbacks. 

a. Errors in classification may cause a class signature to be "captured."
For example, as Class A elements are misclassified as Class B, Class B statistics
begin to look more and more like Class A, making future misclassification even
more likely.

b. The weighting in the averaging is arbitrary; no criteria exist for this
assignment, and, presumably, the values must be determined by experience.

c. This method presupposes very good initial estimates of the signatures.

d. This method restricts the taking of all ground truth to one part of the
image, namely, the first part to be classified.

e. Few techniques account for variations due to angle of observation.

f. Observations of one class yield no information about the variations of
signals from other classes. Consequently, if a class is not homogeneously
distributed throughout an image, the signature statistics may not realistically
reflect the environmental effects.

As a result of these disadvantages, a new model would seem desirable.


3.0 ANALYSIS*


Many of the difficulties of current classification methods are the results of
trying to estimate the current signature of each class. If the class signatures
could be considered as functions of some vector representing the environment, then
signature modeling could be handled with statistical filters, similarly to the
measurement modeling performed in Apollo navigation. This should eliminate the
previously mentioned problems of using the averaging technique.

A problem remains in implementing an environment estimation technique; namely,
in order to model the signal, the class of the target being observed must be known.
This means that, if n classes of targets are observed, then n signal types are
to be processed in estimating the environment vector.

An analogous situation in navigation would be the following: Suppose several
different types of measurements were being made (range, range rate, angles, etc.),
but for each measurement, only the value was known, not the type. Before the
observation could be used to update position and velocity, the measurement type must
be identified. This would be done by asking which type of measurement was most
reasonable, given the current estimate of position and velocity. The observation
would be assumed to be this most reasonable type and would then be incorporated.

This same procedure could be used in processing remotely sensed data. Thus,

1. First, determine what a measurement of each type (for example, each crop
classification) would look like, given the current estimate of the environment;
in other words, estimate the signature of each class. If maximum likelihood is
the classification criterion, then the mean and covariance matrix for each class
should be determined.

2. Then, classify the measurement by using whatever classification criterion
has been decided on.

3. Finally, incorporate the observation to refine the estimate of the environment
state. The equations for this procedure are given in the following sections.
The classification criterion is maximum likelihood.


*This analysis is given in reference 1.


4.0 SENSOR MODEL*


The model of a signal for visible and near-visible light reflected from a
target can be defined as

$$ S = LRT \tag{1} $$

where

S is the signal received
L is the irradiance incident on the target
R is the reflectance of the target
T is the transmittance of the atmosphere from the target to the sensor

This model does not consider effects of the type where incoming radiation is
absorbed by the target and radiated back at a different frequency. Visible light
is seldom radiated in this manner. Path radiance is also not considered. Path
radiance is the signal received by the sensor which is not from the target; for
example, radiation from other points on the ground or from the sun, which, because
of the aerosols in the atmosphere, is reflected into the sensor aperture. Although
this effect is not as significant as the terms in equation (1), it can be
important, and it is hoped that this effect will be added to the model at some
future date.

Reassociating the terms in equation (1) gives

$$ S = R(LT) $$

or

$$ Y = R^* + C $$

where

$$ Y = \ln S, \qquad R^* = \ln R, \qquad C = \ln(LT) $$

The signal Y, as well as R* and C, is a vector. Assume that the ith component
of C has the form

$$ C_i = a_i + b_i U + c_i V + d_i U^2 + e_i UV + f_i V^2 \tag{2} $$

where U and V are the spatial coordinates of the target point.


*This sensor model is taken from reference 2.

Hence,

$$ Y = \begin{bmatrix} (R^*_1 + a_1) & b_1 & c_1 & d_1 & e_1 & f_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ (R^*_n + a_n) & b_n & c_n & d_n & e_n & f_n \end{bmatrix} \begin{bmatrix} 1 \\ U \\ V \\ U^2 \\ UV \\ V^2 \end{bmatrix} \tag{3} $$

The R* and a vectors are combined in equation (3) since neither is dependent on
the position of the target. This gives the final form of the signal model, namely,

$$ Y = \rho + \theta \tag{4} $$

where

Y is the log of the signal received
$\rho = R^* + a$
$\theta_i = b_i U + c_i V + d_i U^2 + e_i UV + f_i V^2$

remembering the assumption that

$$ \ln(LT)_i = a_i + b_i U + c_i V + d_i U^2 + e_i UV + f_i V^2 $$

Equation (4) will be the form used in this algorithm. Note that $\rho$ is spatially
independent; the difference between two $\rho$ values is a function of target identity
alone. On the other hand, $\theta$ is spatially dependent and target independent.
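
As a small numerical check of the log model, the sketch below builds a signal from equation (1) for a single channel and confirms that its logarithm splits into a target term and a position term as in equation (4). The coefficient values, the reflectance, and the helper name `log_environment` are illustrative assumptions, not values from the study.

```python
import math


def log_environment(coeffs, u, v):
    """Evaluate C_i = a + b*U + c*V + d*U^2 + e*U*V + f*V^2 for one channel."""
    a, b, c, d, e, f = coeffs
    return a + b * u + c * v + d * u * u + e * u * v + f * v * v


# Hypothetical values for one channel at one pixel.
reflectance = 0.3                                  # R
coeffs = (1.2, 0.01, -0.02, 1e-4, 0.0, 2e-4)       # a..f, illustrative only
u, v = 5.0, 7.0

c = log_environment(coeffs, u, v)     # C = ln(L*T), equation (2)
signal = reflectance * math.exp(c)    # S = R * (L*T), equation (1)
y = math.log(signal)                  # Y = ln S

rho = math.log(reflectance) + coeffs[0]   # rho = R* + a, target dependent
theta = c - coeffs[0]                     # theta, position dependent
assert abs(y - (rho + theta)) < 1e-12     # equation (4): Y = rho + theta
```

Taking logarithms is what turns the multiplicative model into the additive form that the least-squares machinery of the next section can handle.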


5.0 ESTIMATION ALGORITHM*


A digital image can be thought of as a matrix of numbers. For example, if
the (i,j)th position is to be represented as dark, the number at that position
would be small. The values of the matrix should be thought of as observations of
a signal. If the signal is a linear function, or can be approximated as a linear
function, the terms of the function can be estimated by least squares. If the
image signal values are represented by the vector Y, then to say that the signal
is a linear function of the vector G means there exists a matrix X such that

$$ Y = XG \tag{5} $$


*This derivation was taken from reference 3.

If X is known, then the best estimate of G, in the least squares sense, is given
by

$$ \hat{G} = (X^T X)^{-1} X^T Y \tag{6} $$
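
A minimal sketch of equation (6) using NumPy follows. It assumes that the rows of X are the quadratic terms [1, U, V, U^2, UV, V^2] of the sensor model and uses synthetic, noise-free data; the helper names `design_row` and `least_squares` are hypothetical.

```python
import numpy as np


def design_row(u, v):
    """Row of X for a quadratic surface in the image coordinates (U, V)."""
    return [1.0, u, v, u * u, u * v, v * v]


def least_squares(X, Y):
    """Equation (6): G = (X^T X)^-1 X^T Y, computed with a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ Y)


# Synthetic, noise-free observations on a 4 x 4 grid (illustrative values).
G_true = np.array([0.5, 0.1, -0.2, 0.01, 0.02, -0.03])
points = [(u, v) for u in range(1, 5) for v in range(1, 5)]
X = np.array([design_row(u, v) for u, v in points])
Y = X @ G_true

G_hat = least_squares(X, Y)
print(np.allclose(G_hat, G_true))
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, which is usually the safer numerical choice.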

A sequential estimator seems an obvious choice; however, the choice of the sequence
in which observations are processed is not obvious. The following ideas were used
in selecting the sequence:

1. The data set used to generate G must be well defined.

2. The estimate of G and the computations used at an image point should be
used in estimating the next image points.

3. Because modeling errors are always present, and because these errors
probably increase as the distance from the estimation point increases, large jumps
in the sequence should be avoided.

For these reasons, these choices were made:

1. The estimate of G at the (i,j)th point will be made by using the observations
to the left of and above that point; that is, over the set

$$ \{(x,y) \mid 1 \le x \le i,\ 1 \le y \le j\} \tag{7} $$

2. The estimates of G at (i-1, j) and at (i, j-1) will be used to find
G at (i, j). It turns out that the estimate at (i-1, j-1) is also required.


5.1 Unweighted Estimation Algorithm

The set in expression (7) can be partitioned into

$$ A = \{(x,y) \mid 1 \le x < i,\ 1 \le y < j\} $$

$$ B = \{(x,y) \mid 1 \le x < i,\ y = j\} $$

$$ C = \{(x,y) \mid x = i,\ 1 \le y < j\} $$

$$ D = \{(i,j)\} $$

Clearly, the union of A, B, C, and D is equal to the set in expression (7), and no
point in one of the partition sets is in any of the other sets.
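
The partition can be checked directly on a small example. The sketch below is an illustration only; the function name `partition` and the choice i = 4, j = 3 are assumptions.

```python
def partition(i, j):
    """Split {(x, y) | 1 <= x <= i, 1 <= y <= j} into the sets A, B, C, D."""
    A = {(x, y) for x in range(1, i) for y in range(1, j)}
    B = {(x, j) for x in range(1, i)}
    C = {(i, y) for y in range(1, j)}
    D = {(i, j)}
    return A, B, C, D


i, j = 4, 3
A, B, C, D = partition(i, j)
full = {(x, y) for x in range(1, i + 1) for y in range(1, j + 1)}

assert A | B | C | D == full                            # the union is the set in (7)
assert len(A) + len(B) + len(C) + len(D) == len(full)   # no point is counted twice
```

Note that A together with B is exactly the data set used at (i-1, j), and A together with C is the data set used at (i, j-1), which is what makes the earlier estimates reusable.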

For each partition, there exists a set of equations of the form of equation (5);
the number of equations in the set is equal to the number of points in the partition.
Hence,

$$ \begin{bmatrix} Y_A \\ Y_B \\ Y_C \\ Y_D \end{bmatrix} = \begin{bmatrix} X_A \\ X_B \\ X_C \\ X_D \end{bmatrix} G \tag{8} $$

where the elements in one of the $Y_i$ are the signals in set i = A, B, C, or D
and the rows of the matrix $X_i$ are the coefficients of the elements of G.
Recalling equation (6), the best estimate of G, using all of the observations in
sets A, B, C, and D, is

$$ \hat{G} = \left[ X_A^T X_A + X_B^T X_B + X_C^T X_C + X_D^T X_D \right]^{-1} \left[ X_A^T Y_A + X_B^T Y_B + X_C^T Y_C + X_D^T Y_D \right] \tag{9} $$

Both of the terms in equation (9) use quantities which will be inconvenient to
define, namely, $X_A$, $X_B$, $X_C$, $Y_A$, $Y_B$, and $Y_C$. Fortunately, equation (9)
can be rewritten.

Using $X_{AB}$ to stand for the matrix X over the set A ∪ B,

$$ X_{AB}^T X_{AB} = X_A^T X_A + X_B^T X_B $$

Similarly,

$$ X_{AC}^T X_{AC} = X_A^T X_A + X_C^T X_C $$

Likewise, $\hat{G}_{AB}$, $\hat{G}_{AC}$, and $\hat{G}_A$ denote the best estimates of G, using
the observations in A ∪ B, A ∪ C, and A, respectively. Hence, the inverse term in
equation (9) becomes

$$ \begin{aligned} \left[ X_A^T X_A + X_B^T X_B + X_C^T X_C + X_D^T X_D \right]^{-1} &= \left[ (X_A^T X_A + X_B^T X_B) + (X_A^T X_A + X_C^T X_C) - X_A^T X_A + X_D^T X_D \right]^{-1} \\ &= \left[ X_{AB}^T X_{AB} + X_{AC}^T X_{AC} - X_A^T X_A + X_D^T X_D \right]^{-1} \end{aligned} $$

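The second factor in equation (9) also needs to be rewritten. Recall that

$$ X_A^T X_A \hat{G}_A = X_A^T Y_A $$

$$ X_{AB}^T X_{AB} \hat{G}_{AB} = X_{AB}^T Y_{AB} = X_A^T Y_A + X_B^T Y_B $$

$$ X_{AC}^T X_{AC} \hat{G}_{AC} = X_{AC}^T Y_{AC} = X_A^T Y_A + X_C^T Y_C $$

Hence, the second term becomes

$$ \begin{aligned} X_A^T Y_A + X_B^T Y_B + X_C^T Y_C + X_D^T Y_D &= (X_A^T Y_A + X_B^T Y_B) + (X_A^T Y_A + X_C^T Y_C) - X_A^T Y_A + X_D^T Y_D \\ &= X_{AB}^T X_{AB} \hat{G}_{AB} + X_{AC}^T X_{AC} \hat{G}_{AC} - X_A^T X_A \hat{G}_A + X_D^T Y_D \end{aligned} $$

Substituting these rewritten terms into equation (9) gives

$$ \hat{G} = \left[ X_{AB}^T X_{AB} + X_{AC}^T X_{AC} - X_A^T X_A + X_D^T X_D \right]^{-1} \left[ X_{AB}^T X_{AB} \hat{G}_{AB} + X_{AC}^T X_{AC} \hat{G}_{AC} - X_A^T X_A \hat{G}_A + X_D^T Y_D \right] \tag{10} $$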


Equation (10) is the form that will be used in determining G, assuming Y = XG,
where X is some matrix and the equations in the system are to be equally weighted.
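
The sketch below checks equation (10) numerically: the estimate assembled from the three earlier estimates and the new point should agree with a direct least-squares fit over the whole set. It assumes the quadratic rows [1, U, V, U^2, UV, V^2] for X and arbitrary synthetic observations; the helper names are hypothetical.

```python
import numpy as np


def design(points):
    """Stack rows [1, U, V, U^2, UV, V^2] for a list of (U, V) points."""
    return np.array([[1.0, u, v, u * u, u * v, v * v] for u, v in points])


def normal_matrix_and_estimate(points, signal):
    """Return (X^T X, G_hat) for the observations at `points`."""
    X = design(points)
    Y = np.array([signal[p] for p in points])
    N = X.T @ X
    return N, np.linalg.solve(N, X.T @ Y)


rng = np.random.default_rng(0)
i, j = 5, 5
grid = [(x, y) for x in range(1, i + 1) for y in range(1, j + 1)]
signal = {p: rng.normal() for p in grid}    # arbitrary synthetic observations

A = [(x, y) for x, y in grid if x < i and y < j]
AB = [(x, y) for x, y in grid if x < i]     # A union B
AC = [(x, y) for x, y in grid if y < j]     # A union C

N_A, G_A = normal_matrix_and_estimate(A, signal)
N_AB, G_AB = normal_matrix_and_estimate(AB, signal)
N_AC, G_AC = normal_matrix_and_estimate(AC, signal)

x_d = design([(i, j)])                      # single row for the set D
y_d = np.array([signal[(i, j)]])

# Equation (10): combine the three earlier estimates with the new point.
N = N_AB + N_AC - N_A + x_d.T @ x_d
G_10 = np.linalg.solve(N, N_AB @ G_AB + N_AC @ G_AC - N_A @ G_A + x_d.T @ y_d)

_, G_direct = normal_matrix_and_estimate(grid, signal)
print(np.allclose(G_10, G_direct))
```

Only the 6 x 6 normal matrices and the 6-element estimates from the three neighbouring points need to be kept; the raw observations in A, B, and C are never revisited.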


5.2 Weighted Estimation Algorithm

In equation (10) each observation has exactly the same importance in calculating
G. This is reasonable if all the observations are equally good. However, more
distant pixels would be expected to be less useful than closer pixels. Consequently,
distant measurements should be downweighted.

Suppose G is to be estimated at some point $P_n$ in the image. Select a number r
such that 0 < r < 1. If d(i,n) is the distance, by whatever definition, between
image points $P_i$ and $P_n$, then the observation at $P_i$ may be weighted by
$r^{d(i,n)}$. Using this weighting, the greater the distance of an observation
from $P_n$, the smaller the weight, until finally the very distant observations
are, effectively, thrown away.

The remaining task is to define the distance measure d(i,n). Suppose
$P_1 = (U_1, V_1)$ and $P_2 = (U_2, V_2)$ are points in the image. Define

$$ d(1,2) = |U_1 - U_2| + |V_1 - V_2| \tag{11} $$

The convenience of this distance measure will become apparent.
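
A short sketch of the distance measure of equation (11) and the resulting weights follows. The value r = 0.5 and the sample points are illustrative assumptions; any 0 < r < 1 is permitted by the text.

```python
def distance(p1, p2):
    """Equation (11): d(1,2) = |U1 - U2| + |V1 - V2|."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])


def weight(r, estimation_point, observation_point):
    """Weight r**d given to an observation when G is estimated elsewhere."""
    return r ** distance(estimation_point, observation_point)


r = 0.5                      # illustrative choice
p_n = (4, 4)                 # point where G is estimated
for p_i in [(4, 4), (3, 4), (2, 3), (1, 1)]:
    print(p_i, distance(p_n, p_i), weight(r, p_n, p_i))
# (4, 4) 0 1.0
# (3, 4) 1 0.5
# (2, 3) 3 0.125
# (1, 1) 6 0.015625
```

Because the distance is a sum of row and column offsets, stepping the estimation point one pixel away from every observation raises each distance by exactly one, which is the property the derivation relies on.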

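Equation (5), at point $P_n = (U_n, V_n)$, becomes

$$ W(U_n,V_n)\,Y = W(U_n,V_n)\,XG \tag{12} $$

where

$$ W(U_n,V_n) = \begin{bmatrix} r^{d(n,1)} & & 0 \\ & \ddots & \\ 0 & & r^{d(n,n)} \end{bmatrix} \tag{13} $$

Here $W(U_n,V_n)$ is a diagonal matrix with the general term $r^{d(n,i)}$.

The best estimate of G in equation (12) is

$$ \hat{G} = \left[ X^T W(U_n,V_n)^2 X \right]^{-1} X^T W(U_n,V_n)^2\, Y $$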
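
The weighted estimate can be sketched with NumPy as shown below. The quadratic rows of X, the value r = 0.8, and the function name `weighted_estimate` are assumptions made for the example; the data are synthetic and noise free, so the weighted fit should recover the coefficients exactly.

```python
import numpy as np


def weighted_estimate(points, Y, r, at):
    """G_hat = (X^T W^2 X)^-1 X^T W^2 Y with W = diag(r**d), d from equation (11)."""
    X = np.array([[1.0, u, v, u * u, u * v, v * v] for u, v in points])
    d = np.array([abs(u - at[0]) + abs(v - at[1]) for u, v in points])
    w2 = (r ** d) ** 2                        # diagonal of W^2
    return np.linalg.solve(X.T @ (w2[:, None] * X), X.T @ (w2 * Y))


points = [(u, v) for u in range(1, 5) for v in range(1, 5)]
G_true = np.array([0.5, 0.1, -0.2, 0.01, 0.02, -0.03])
Y = np.array([[1.0, u, v, u * u, u * v, v * v] for u, v in points]) @ G_true

G_hat = weighted_estimate(points, Y, r=0.8, at=(4, 4))
print(np.allclose(G_hat, G_true))
```

Since W multiplies both sides of equation (12), the effective weight on each squared residual is r raised to twice the distance.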

Notice that each diagonal term of W(U,V) changes as the G estimation point
changes. Suppose G is estimated at $P_n$. Then the ith diagonal term of
$W(U_n,V_n)$ is $r^{d(i,n)}$; in other words, $r^{d(i,n)}$ is the weight of the ith
observation when estimating G at point $P_n$. Suppose G is now to be estimated at
another point Q which is a distance of d(i,n) + 1 from $P_i$. Then the appropriate
weight for the ith observation in computing G at point Q is $r^{d(i,n)+1}$. Hence,
the ith diagonal element of the weighting matrix at Q is r times the ith
diagonal element of the weighting matrix at $P_n$.


Suppose G is to be estimated at (U,V). Further, suppose G has been
estimated at (U,V-1), (U-1,V), and (U-1,V-1), which means that weighting matrices
have been defined at these points also. Use the partitioning described in the
unweighted case,

$$ A = \{(x,y) \mid 1 \le x < U,\ 1 \le y < V\} $$

$$ B = \{(x,y) \mid 1 \le x < U,\ y = V\} $$

$$ C = \{(x,y) \mid x = U,\ 1 \le y < V\} $$

$$ D = \{(U,V)\} $$

The estimates of G at (U-1, V-1), (U-1, V), and (U, V-1), respectively, are

$$ \begin{aligned} \hat{G} \text{ at } (U-1, V-1) &= \left[ X_A^T W_{(U-1,V-1)}^2 X_A \right]^{-1} X_A^T W_{(U-1,V-1)}^2 Y_A \\ \hat{G} \text{ at } (U-1, V) &= \left[ X_{AB}^T W_{(U-1,V)}^2 X_{AB} \right]^{-1} X_{AB}^T W_{(U-1,V)}^2 Y_{AB} \\ \hat{G} \text{ at } (U, V-1) &= \left[ X_{AC}^T W_{(U,V-1)}^2 X_{AC} \right]^{-1} X_{AC}^T W_{(U,V-1)}^2 Y_{AC} \end{aligned} \tag{14} $$

where $Y_A$, $Y_{AB}$, and $Y_{AC}$ are the observations from the sets A, A ∪ B, and A ∪ C,
respectively.

Denote by R the matrix

$$ R = \begin{bmatrix} r & & 0 \\ & \ddots & \\ 0 & & r \end{bmatrix} $$

where r is the weighting scalar mentioned previously. The size of R will be
implicitly defined by the equation in which it appears.

Recall that $W_{(U-1,V-1)}$, $W_{(U-1,V)}$, and $W_{(U,V-1)}$ are the weights for the
elements in $Y_A$, $Y_{AB}$, and $Y_{AC}$, respectively. Partitioning $W_{(U-1,V)}$ and
$W_{(U,V-1)}$ gives

$$ W_{(U-1,V)} = \begin{bmatrix} W_{(U-1,V)}^{(A)} & 0 \\ 0 & W_{(U-1,V)}^{(B)} \end{bmatrix} $$

and

$$ W_{(U,V-1)} = \begin{bmatrix} W_{(U,V-1)}^{(A)} & 0 \\ 0 & W_{(U,V-1)}^{(C)} \end{bmatrix} $$

where $W_{(U-1,V)}^{(A)}$ and $W_{(U-1,V)}^{(B)}$ are the diagonal weighting matrices for
$Y_A$ and $Y_B$, respectively, defined at (U-1, V), and $W_{(U,V-1)}^{(A)}$ and
$W_{(U,V-1)}^{(C)}$ are the diagonal weighting matrices for $Y_A$ and $Y_C$, respectively,
defined at (U, V-1).

Note that the distance from (U-1, V-1) to (U, V) is 2. Furthermore, the
distance from (U, V) to any point in A is 2 greater than the distance d(i,n)
from (U-1, V-1) to that point in A. Hence, the weight for an observation at
any point in A is $r^{d(i,n)+2}$. Notice that $r^{d(i,n)}$ is the general diagonal
term of $W_{(U-1,V-1)}^{(A)} = W_{(U-1,V-1)}$. This means

$$ W_{(U,V)}^{(A)} = R^2\, W_{(U-1,V-1)} \tag{15} $$

Similarly, because the distance from (U, V) to a point in A ∪ B or A ∪ C
is 1 greater than the distance from (U-1, V) or (U, V-1), respectively, to
that point,

$$ \begin{aligned} W_{(U,V)}^{(A)} &= R\, W_{(U,V-1)}^{(A)} = R\, W_{(U-1,V)}^{(A)} \\ W_{(U,V)}^{(C)} &= R\, W_{(U,V-1)}^{(C)} \\ W_{(U,V)}^{(B)} &= R\, W_{(U-1,V)}^{(B)} \end{aligned} \tag{16} $$
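
The weight relations of equations (15) and (16) can be verified on a small grid. The sketch below uses r = 0.5 so that every weight is an exact power of two and the comparisons can be made exactly; the grid size and helper names are assumptions.

```python
def d(p, q):
    """Distance of equation (11)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


r, U, V = 0.5, 4, 3
A = [(x, y) for x in range(1, U) for y in range(1, V)]
B = [(x, V) for x in range(1, U)]
C = [(U, y) for y in range(1, V)]


def w(at, p):
    """Diagonal weight r**d of the observation at p when estimating at `at`."""
    return r ** d(at, p)


# Equation (15): over A, the weights at (U, V) are r^2 times those at (U-1, V-1).
assert all(w((U, V), p) == r ** 2 * w((U - 1, V - 1), p) for p in A)
# Equation (16): one step from (U-1, V) or (U, V-1) costs one factor of r.
assert all(w((U, V), p) == r * w((U - 1, V), p) for p in A + B)
assert all(w((U, V), p) == r * w((U, V - 1), p) for p in A + C)
```

These relations hold only because every point of A ∪ B lies strictly to one side of column U and every point of A ∪ C lies strictly to one side of row V, so the step always increases the distance.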



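Now the machinery is gathered for defining G at (U, V). Recall that the best
estimate of G at (U, V) is

$$ \begin{aligned} \hat{G} = {} & \left[ X_A^T \bigl[W_{(U,V)}^{(A)}\bigr]^2 X_A + X_B^T \bigl[W_{(U,V)}^{(B)}\bigr]^2 X_B + X_C^T \bigl[W_{(U,V)}^{(C)}\bigr]^2 X_C + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 X_D \right]^{-1} \\ & \times \left[ X_A^T \bigl[W_{(U,V)}^{(A)}\bigr]^2 Y_A + X_B^T \bigl[W_{(U,V)}^{(B)}\bigr]^2 Y_B + X_C^T \bigl[W_{(U,V)}^{(C)}\bigr]^2 Y_C + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 Y_D \right] \end{aligned} \tag{17} $$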

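Recall that, with $W_{(U,V)}^{(A \cup B)}$ denoting the weights at (U, V) for the
observations in A ∪ B,

$$ \begin{aligned} X_{AB}^T \bigl[W_{(U,V)}^{(A \cup B)}\bigr]^2 X_{AB} &= X_A^T \bigl[W_{(U,V)}^{(A)}\bigr]^2 X_A + X_B^T \bigl[W_{(U,V)}^{(B)}\bigr]^2 X_B \\ &= X_A^T R^2 \bigl[W_{(U-1,V)}^{(A)}\bigr]^2 X_A + X_B^T R^2 \bigl[W_{(U-1,V)}^{(B)}\bigr]^2 X_B \\ &= R^2 X_A^T \bigl[W_{(U-1,V)}^{(A)}\bigr]^2 X_A + R^2 X_B^T \bigl[W_{(U-1,V)}^{(B)}\bigr]^2 X_B \end{aligned} $$

Similarly,

$$ \begin{aligned} X_{AC}^T \bigl[W_{(U,V)}^{(A \cup C)}\bigr]^2 X_{AC} &= X_A^T R^2 \bigl[W_{(U,V-1)}^{(A)}\bigr]^2 X_A + X_C^T R^2 \bigl[W_{(U,V-1)}^{(C)}\bigr]^2 X_C \\ &= R^2 X_A^T \bigl[W_{(U,V-1)}^{(A)}\bigr]^2 X_A + R^2 X_C^T \bigl[W_{(U,V-1)}^{(C)}\bigr]^2 X_C \end{aligned} $$

Also,

$$ \begin{aligned} X_A^T \bigl[W_{(U,V)}^{(A)}\bigr]^2 X_A &= X_A^T R^4\, W_{(U-1,V-1)}^2 X_A \\ &= R^4 X_A^T W_{(U-1,V-1)}^2 X_A \end{aligned} $$

Now the inverse factor of equation (17) can be rewritten as

$$ \begin{aligned} & X_A^T \bigl[W_{(U,V)}^{(A)}\bigr]^2 X_A + X_B^T \bigl[W_{(U,V)}^{(B)}\bigr]^2 X_B + X_C^T \bigl[W_{(U,V)}^{(C)}\bigr]^2 X_C + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 X_D \\ &\quad = R^2 \left[ X_A^T \bigl[W_{(U-1,V)}^{(A)}\bigr]^2 X_A + X_B^T \bigl[W_{(U-1,V)}^{(B)}\bigr]^2 X_B \right] + R^2 \left[ X_A^T \bigl[W_{(U,V-1)}^{(A)}\bigr]^2 X_A + X_C^T \bigl[W_{(U,V-1)}^{(C)}\bigr]^2 X_C \right] \\ &\qquad - R^4 X_A^T W_{(U-1,V-1)}^2 X_A + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 X_D \\ &\quad = R^2 \left[ X_{AB}^T W_{(U-1,V)}^2 X_{AB} + X_{AC}^T W_{(U,V-1)}^2 X_{AC} \right] - R^4 X_A^T W_{(U-1,V-1)}^2 X_A + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 X_D \end{aligned} \tag{18} $$

Now, the first factor of equation (17) is in terms of matrices that were defined
at (U, V-1), (U-1, V), or (U-1, V-1) plus the new term defined at (U, V). This
is the form of the inverse term which will be used.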

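All that remains is to rewrite the second factor in equation (17). We know

$$ X_A^T W_{(U-1,V-1)}^2 Y_A = \left[ X_A^T W_{(U-1,V-1)}^2 X_A \right] \hat{G}_A $$

$$ X_{AB}^T W_{(U-1,V)}^2 Y_{AB} = \left[ X_{AB}^T W_{(U-1,V)}^2 X_{AB} \right] \hat{G}_{AB} $$

$$ X_{AC}^T W_{(U,V-1)}^2 Y_{AC} = \left[ X_{AC}^T W_{(U,V-1)}^2 X_{AC} \right] \hat{G}_{AC} $$

where $\hat{G}_A$, $\hat{G}_{AB}$, and $\hat{G}_{AC}$ are the best estimates of G over A, A ∪ B,
and A ∪ C at (U-1, V-1), (U-1, V), and (U, V-1), respectively, as in equation (14).

Adjusting the first equation to change the weighting matrix reference point
to (U, V),

$$ X_A^T \left[ R^4 W_{(U-1,V-1)}^2 \right] Y_A = R^4 \left[ X_A^T W_{(U-1,V-1)}^2 X_A \right] \hat{G}_A $$

Similarly,

$$ X_{AB}^T \bigl[W_{(U,V)}^{(A \cup B)}\bigr]^2 Y_{AB} = R^2 \left[ X_{AB}^T W_{(U-1,V)}^2 X_{AB} \right] \hat{G}_{AB} $$

and

$$ X_{AC}^T \bigl[W_{(U,V)}^{(A \cup C)}\bigr]^2 Y_{AC} = R^2 \left[ X_{AC}^T W_{(U,V-1)}^2 X_{AC} \right] \hat{G}_{AC} $$

The second factor in (17) can now be rewritten

$$ \begin{aligned} & X_A^T \bigl[W_{(U,V)}^{(A)}\bigr]^2 Y_A + X_B^T \bigl[W_{(U,V)}^{(B)}\bigr]^2 Y_B + X_C^T \bigl[W_{(U,V)}^{(C)}\bigr]^2 Y_C + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 Y_D \\ &\quad = X_{AB}^T \bigl[W_{(U,V)}^{(A \cup B)}\bigr]^2 Y_{AB} + X_{AC}^T \bigl[W_{(U,V)}^{(A \cup C)}\bigr]^2 Y_{AC} - X_A^T \bigl[W_{(U,V)}^{(A)}\bigr]^2 Y_A + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 Y_D \\ &\quad = R^2 \left[ X_{AB}^T W_{(U-1,V)}^2 X_{AB} \hat{G}_{AB} + X_{AC}^T W_{(U,V-1)}^2 X_{AC} \hat{G}_{AC} \right] - R^4 \left[ X_A^T W_{(U-1,V-1)}^2 X_A \hat{G}_A \right] + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 Y_D \end{aligned} \tag{19} $$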





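Substituting (18) and (19) into (17), the best estimate of G at (U, V), in
the weighted least squares sense, is

$$ \begin{aligned} \hat{G} = {} & \left[ R^2 \left( X_{AB}^T W_{(U-1,V)}^2 X_{AB} + X_{AC}^T W_{(U,V-1)}^2 X_{AC} \right) - R^4 X_A^T W_{(U-1,V-1)}^2 X_A + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 X_D \right]^{-1} \\ & \times \left[ R^2 \left( X_{AB}^T W_{(U-1,V)}^2 X_{AB} \hat{G}_{AB} + X_{AC}^T W_{(U,V-1)}^2 X_{AC} \hat{G}_{AC} \right) - R^4 X_A^T W_{(U-1,V-1)}^2 X_A \hat{G}_A + X_D^T \bigl[W_{(U,V)}^{(D)}\bigr]^2 Y_D \right] \end{aligned} \tag{20} $$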
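
The weighted recursion of equation (20) can be checked against a direct weighted fit over the whole set, as sketched below. The sketch assumes quadratic rows for X, writes R = rI as the scalar r, and uses the fact that the new point is at distance zero from itself and so has weight one; the data and helper names are illustrative.

```python
import numpy as np


def rows(points):
    """Stack rows [1, U, V, U^2, UV, V^2] for a list of (U, V) points."""
    return np.array([[1.0, u, v, u * u, u * v, v * v] for u, v in points])


def weighted_normal(points, signal, r, at):
    """Return (X^T W^2 X, G_hat) with W = diag(r**d) referenced to `at`."""
    X = rows(points)
    Y = np.array([signal[p] for p in points])
    w2 = np.array([r ** (2 * (abs(u - at[0]) + abs(v - at[1]))) for u, v in points])
    N = X.T @ (w2[:, None] * X)
    return N, np.linalg.solve(N, X.T @ (w2 * Y))


rng = np.random.default_rng(1)
U, V, r = 5, 5, 0.8
grid = [(x, y) for x in range(1, U + 1) for y in range(1, V + 1)]
signal = {p: rng.normal() for p in grid}     # arbitrary synthetic observations

A = [p for p in grid if p[0] < U and p[1] < V]
AB = [p for p in grid if p[0] < U]           # A union B
AC = [p for p in grid if p[1] < V]           # A union C

N_A, G_A = weighted_normal(A, signal, r, (U - 1, V - 1))
N_AB, G_AB = weighted_normal(AB, signal, r, (U - 1, V))
N_AC, G_AC = weighted_normal(AC, signal, r, (U, V - 1))

x_d = rows([(U, V)])                         # new point: distance 0, weight 1
y_d = np.array([signal[(U, V)]])

# Equation (20), with R = r*I written as the scalar r.
N = r**2 * (N_AB + N_AC) - r**4 * N_A + x_d.T @ x_d
b = r**2 * (N_AB @ G_AB + N_AC @ G_AC) - r**4 * (N_A @ G_A) + x_d.T @ y_d
G_20 = np.linalg.solve(N, b)

_, G_direct = weighted_normal(grid, signal, r, (U, V))
print(np.allclose(G_20, G_direct))
```

As in the unweighted case, only the weighted normal matrices and estimates at the three neighbouring points are required at each step.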

6.0 IMPLEMENTATION

To complete the definition of the adaptive classifier, the method of handling
the training data will be defined. Finally, the procedure for classification
outlined in section 3.0 will be given in detail.


6.1 Training Data Calculations

First, the signatures of the classification sets must be estimated. This
should be accomplished by evaluating the equations

$$ \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_n \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = (H^T H)^{-1} H^T Y \tag{21} $$

where

$\rho_i$ = the log of the reflectance of the ith class plus a
b, c, d, e, f = the coefficients of $\theta$
Y = the vector of the observations
H = the matrix over the training data such that the jth line is

$$ [\, I_i,\ U_j,\ V_j,\ U_j^2,\ U_j V_j,\ V_j^2 \,] $$

where

$I_i$ = a row vector with all zeroes except for a one in the ith location,
indicating the jth observation came from the ith class
$(U_j, V_j)$ = the position, in U,V coordinates, of the jth observation

Equations (3) and (6) show that equation (21) is a solution of

$$ Y = H \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_n \\ b \\ c \\ d \\ e \\ f \end{bmatrix} $$
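
The training calculation of equation (21) can be sketched as follows for one channel and two classes. The row layout of H follows the definition in the text; the class labels, true coefficients, and helper name `training_row` are assumptions used to generate synthetic data whose solution is known.

```python
import numpy as np


def training_row(class_index, n_classes, u, v):
    """Row of H: class indicator I_i followed by U, V, U^2, UV, V^2."""
    indicator = [0.0] * n_classes
    indicator[class_index] = 1.0
    return indicator + [u, v, u * u, u * v, v * v]


# Synthetic one-channel training data: two classes, known truth for checking.
rho_true = np.array([1.0, 2.5])                            # rho_1, rho_2
env_true = np.array([0.05, -0.03, 0.002, 0.001, -0.004])   # b, c, d, e, f
samples = [((u + v) % 2, u, v) for u in range(1, 6) for v in range(1, 6)]

H = np.array([training_row(k, 2, u, v) for k, u, v in samples])
Y = H @ np.concatenate([rho_true, env_true])

solution = np.linalg.solve(H.T @ H, H.T @ Y)               # equation (21)
rho_hat, env_hat = solution[:2], solution[2:]
print(np.allclose(rho_hat, rho_true), np.allclose(env_hat, env_true))
```

The indicator columns play the role of the constant term, one per class, which is why the global polynomial carries no separate coefficient a.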




The solution of equation (21) will yield an estimate of the class reflectances
and the global estimate of the environment function $\theta$. In order to use a maximum
likelihood classifier, as described, for example, in reference 4, the covariance
matrices of the classes are also needed. These may be estimated by

$$ C_i = E_i\!\left[ (Y - \theta - \rho_i)(Y - \theta - \rho_i)^T \right] $$

where

$C_i$ = the covariance matrix of the ith class
$E_i(\cdot)$ = the expected value of $(\cdot)$ taken over the ith class
Y = the log of the received signal
$\theta$ = the estimate of the environment; i.e., $bU + cV + dU^2 + eUV + fV^2$
$\rho_i$ = the log of the average reflectance of the ith class plus a

Finally, the best local estimate of the environment vector must be made at
point (1,1), since that is the point where classification will begin. The best local
estimate at (1,1) will be a solution of

$$ W_i Y_i = W_i\, [\, \rho_i,\ 1,\ U_i,\ V_i,\ U_i^2,\ U_i V_i,\ V_i^2 \,] \begin{bmatrix} 1 \\ a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} \tag{22} $$

where

$W_i$ = the weight of the ith observation (equal to $\delta^{(U_i + V_i - 2)}$, $0 < \delta \le 1$)
$Y_i$ = the ith observation (scalar) for the channel being considered
$\rho_i$ = the logarithm of the reflectance of the object of the ith observation plus a
$(U_i, V_i)$ = the coordinates of the ith observation
[a,...,f] = the vector of coefficients of the log environment polynomial;
that is, $\theta = a + bU + cV + dU^2 + eUV + fV^2$

The solution of equation (22) is

$$ \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = (K^T W^2 K)^{-1} K^T W^2 (Y - P) \tag{23} $$

where K is the matrix whose ith row is $[\, 1,\ U_i,\ V_i,\ U_i^2,\ U_i V_i,\ V_i^2 \,]$,

$$ W = \begin{bmatrix} \ddots & & 0 \\ & \delta^{\,U_i + V_i - 2} & \\ 0 & & \ddots \end{bmatrix} $$

and P is the vector whose ith element is $\rho_i$.

A couple of comments are necessary here. First, in general, $r < \delta$, where r
is defined in section 5.2. This means that in estimating the environment using
training data, the old measurements are less downweighted than they would be in
the normal weighted estimation procedure. This reflects the greater security
resulting from knowing the classification of the data. Second, for the local
estimates of the environment, a constant term will be used in the polynomial. The
globally defined polynomial also has a constant term, but the term is combined with
the reflectance.


6.2 Test Data Calculations 

The calculations to be performed on the data to be classified are

1. Estimate the environment state from prior observations.

2. Classify the observation.

3. Update the estimate of the environment.

The equations to perform these functions are as follows:

Suppose the observation at coordinates (U, V) is to be processed. Denote
by $\hat{G}^-_{(U,V)}$ the best estimate of the environment state vector at (U, V). Then,

$$ \hat{G}^-_{(U,V)} = \frac{\hat{G}_{(U-1,V)} + \hat{G}_{(U,V-1)}}{2} \tag{24} $$

The minus sign indicates that the observation at (U, V) has not yet been used in
the estimate of G.

The estimate $\hat{G}^-_{(1,1)}$ is determined from the training data only. In the
event that U = 1, then $\hat{G}^-_{(U,V)} = \hat{G}_{(U,V-1)}$; where V = 1, then
$\hat{G}^-_{(U,V)} = \hat{G}_{(U-1,V)}$.
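
The prediction step of equation (24), including the edge cases along the first row and column, might be written as below. The dictionary layout, the function name `predict_environment`, and the shortened two-element coefficient lists are assumptions made for the illustration.

```python
def predict_environment(estimates, u, v, training_estimate):
    """Equation (24): prior estimate of G at (u, v) before using its observation.

    `estimates` maps already processed points (U, V) to coefficient lists.
    """
    if u == 1 and v == 1:
        return list(training_estimate)          # from training data only
    if u == 1:
        return list(estimates[(u, v - 1)])      # no neighbour at U - 1
    if v == 1:
        return list(estimates[(u - 1, v)])      # no neighbour at V - 1
    left, up = estimates[(u - 1, v)], estimates[(u, v - 1)]
    return [(a + b) / 2.0 for a, b in zip(left, up)]


estimates = {(1, 2): [2.0, 0.0], (2, 1): [4.0, 1.0]}      # shortened, illustrative
print(predict_environment(estimates, 2, 2, [0.0, 0.0]))   # [3.0, 0.5]
```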

Now that G has been estimated, $\theta$ can be calculated, where

$$ \theta + a = [\, 1,\ U,\ V,\ U^2,\ UV,\ V^2 \,]\, \hat{G}^-_{(U,V)} \tag{25} $$

Then the number $\rho$ may be estimated by

$$ \rho = Y - \theta \tag{26} $$

where Y is the log of the observed signal.

Finally, the likelihood of each class must be computed.

$$ L(i \mid \rho) = -\ln |C_i| - (\rho - \rho_i)^T C_i^{-1} (\rho - \rho_i) \tag{27} $$

where

$L(i \mid \rho)$ = the likelihood of the observation $\rho$ belonging to the ith class
$\rho_i$ = the mean of the ith class
$C_i$ = the covariance matrix of the ith class

The observation should be classified into the class i such that

$$ L(i \mid \rho) > L(j \mid \rho), \quad i \ne j $$

In the case where $L(i \mid \rho) = L(j \mid \rho)$, some arbitrary choice between i and j must
be made.
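
The classification rule of equation (27) can be sketched with NumPy as follows. The two classes, their means and covariances, and the function names are hypothetical; ties are broken in favour of the lowest class index, which is one possible arbitrary choice.

```python
import numpy as np


def log_likelihood(rho, mean, cov):
    """Equation (27): L(i|rho) = -ln|C_i| - (rho - rho_i)^T C_i^-1 (rho - rho_i)."""
    diff = rho - mean
    _, logdet = np.linalg.slogdet(cov)
    return -logdet - diff @ np.linalg.solve(cov, diff)


def classify(rho, means, covs):
    """Return the index of the class with the largest likelihood."""
    scores = [log_likelihood(rho, m, c) for m, c in zip(means, covs)]
    return int(np.argmax(scores))      # ties resolve to the lowest index


# Two hypothetical classes in a two-channel feature space.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
print(classify(np.array([2.5, 2.0]), means, covs))   # 1
```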

Now that the classification has been established, the observation may be used 
to update the estimate of G. This is done by means of equation (20). A descriptive 
flow chart of the estimation algorithm is given in figure 1. Details are given 
in figure 2. 


7.0 CONCLUSION 


To cope with signature variability, an algorithm has been defined which will
adaptively classify remotely sensed data in the visible and near-infrared band.
The signal is divided into a space-dependent component and a target-dependent
component. The target-dependent component is assumed fixed across the image for
each target type. The space-dependent component is estimated iteratively by a
weighted least-squares algorithm. Included in the study was the derivation of
the sensor model and the two-dimensional estimation algorithm.



Figure 1.- Descriptive flow chart of estimation algorithm. (The chart cites
equations (24), (27), and (20); numbers in parentheses refer to equations in the
text.)

Figure 2.- Detailed flow chart of estimation algorithm. (Numbers in parentheses
refer to equations in the text.)